Adel Abu Hashim - Oct 2020
This case study aims to help Amber Heard by analyzing new accounts posting or commenting against a victim of a social bot disinformation/influence operation.
We have four main datasets (scraped from Reddit):
- 1- A dataset with submissions & comments data (2018).
- 2- Users Data (from 2006 to 2018).
- 3- A merged dataset (submissions & comments data, users data).
- 4- Daily creation data (# of accounts created per day from 2006 to 2018)
#import dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import helpers
import matplotlib.dates as mdates
import plotly.express as px
import plotly.graph_objects as go
import re
import warnings
warnings.filterwarnings('ignore')
sb.set_style("darkgrid")
%matplotlib inline
# load data
df = pd.read_csv("cleaned_data/reddit_cleaned_2018.csv")
df_merged = pd.read_csv("cleaned_data/reddit_merged_2018.csv")
# convert to datetime
df.created_at = pd.to_datetime(df.created_at)
df_merged.created_at = pd.to_datetime(df_merged.created_at)
df_merged.user_created_at = pd.to_datetime(df_merged.user_created_at)
print(df.shape)
df.head(2);
(6993, 17)
print(df_merged.shape)
df_merged.head(2);
(6993, 24)
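The merged dataset carries two account-age columns, `diff` and `days_after_creation`. A minimal sketch of how they could be derived, assuming `diff` is simply the post timestamp minus the account creation timestamp (the timestamps below are copied from the `Shutupvoices` row displayed later in this notebook; the derivation itself is an assumption, not the confirmed cleaning step):

```python
import pandas as pd

# One row with the timestamps of the 'Shutupvoices' record shown later on.
toy = pd.DataFrame({
    "created_at": pd.to_datetime(["2018-03-16 12:41:49"]),
    "user_created_at": pd.to_datetime(["2017-07-02 02:35:59"]),
})

# Assumed derivation: account age at the moment of posting.
toy["diff"] = toy["created_at"] - toy["user_created_at"]
# Whole days only; the time-of-day remainder is dropped.
toy["days_after_creation"] = toy["diff"].dt.days

print(toy[["diff", "days_after_creation"]])
```

Under this assumption the toy row reproduces the values stored in the merged data for that record (257 days 10:05:50 and 257 days).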
# Filter on banned accounts
df_banned = df_merged[df_merged['is_banned']]
print(df_banned.shape)
df_banned.head(2);
(586, 24)
len(df_banned.user_name.unique())
275
df_banned.submission_comment.value_counts()
submission    305
comment       281
Name: submission_comment, dtype: int64
Note: we only have user names for the banned accounts
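One way to check this note is to measure how often the user metadata columns are missing for banned versus non-banned rows. The sketch below uses a small hypothetical frame with the same column names as the merged data; the values are illustrative, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows mimicking the merged layout (illustrative values).
toy = pd.DataFrame({
    "user_name": ["a", "b", "c"],
    "is_banned": [True, True, False],
    "comment_karma": [np.nan, np.nan, 47.0],
    "link_karma": [np.nan, np.nan, 1914.0],
    "user_created_at": pd.to_datetime([None, None, "2017-03-16 17:16:16"]),
})

meta_cols = ["comment_karma", "link_karma", "user_created_at"]

# Share of missing values per metadata column, split by banned status.
missing_share = toy[meta_cols].isna().groupby(toy["is_banned"]).mean()
print(missing_share)
```

Running the same expression on `df_merged` would show whether every banned row really lacks karma and creation-date information.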
# Filter on unverified accounts
df_unverified = df_merged[~df_merged['has_verified_email']]
print(df_unverified.shape)
df_unverified.head(2);
(789, 24)
len(df_unverified.user_name.unique())
495
# Filter on Accounts created in 2018
df_18 = df_merged[df_merged['user_created_at'].dt.year == 2018]
print(df_18.shape)
df_18.head(2);
(875, 24)
peak_day = '2018-07-03'
# Filter on Peak Day
df_peak = df_merged[df_merged['created_at'].dt.date.astype('str') == peak_day]
df_peak_submissions = df_peak.query("submission_comment == 'submission'")
# Filter on Peak Day For Unverified accounts
df_unverified_peak = df_unverified[df_unverified['created_at'].dt.date.astype('str') == peak_day]
df_unverified_peak_submissions = df_unverified_peak.query("submission_comment == 'submission'")
# Filter on Submissions
df_submissions = df_merged.query("submission_comment == 'submission'")
df_unverified_submissions = df_unverified.query("submission_comment == 'submission'")
colors = px.colors.qualitative.T10
fig = px.pie(df_merged.banned_unverified.value_counts().to_frame().reset_index(),
values='banned_unverified', names='index', color_discrete_sequence = colors,
title = 'Contributions of banned / unverified /others in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
df_merged.banned_unverified.value_counts().to_frame()
| | banned_unverified |
|---|---|
| others | 5618 |
| unverified | 789 |
| banned | 586 |
11.3+8.38
19.68
df_peak.banned_unverified.value_counts().to_frame()
| | banned_unverified |
|---|---|
| others | 136 |
| banned | 33 |
| unverified | 21 |
fig = px.pie(df_peak.banned_unverified.value_counts().to_frame().reset_index(),
values='banned_unverified', names='index', color_discrete_sequence = colors,
title = f'Contributions of banned / unverified /others on the peak day ({peak_day})')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
17.4+11.1
28.5
NOTE: About 28.5% of peak-day contributions were made by banned accounts and accounts with an unverified email address.
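The figure in the note comes from adding two rounded pie-chart percentages. As a cross-check, the combined share can be computed directly from the peak-day counts in the table above. This is only a sketch; `peak_counts` is a hand-typed copy of that table, not a new result.

```python
import pandas as pd

# Peak-day counts copied from the value_counts() table above.
peak_counts = pd.Series({"others": 136, "banned": 33, "unverified": 21})

# Banned and unverified contributions together, as a share of all contributions.
flagged = peak_counts[["banned", "unverified"]].sum()
share = 100 * flagged / peak_counts.sum()

print(round(share, 2))
```

Computed this way the share is 54 of 190 contributions, so the rounded-sum figure is a slight overestimate of the same quantity.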
df_submissions.banned_unverified.value_counts().to_frame()
| | banned_unverified |
|---|---|
| others | 1572 |
| banned | 305 |
| unverified | 123 |
fig = px.pie(df_submissions.banned_unverified.value_counts().to_frame().reset_index(),
values='banned_unverified', names='index', color_discrete_sequence = colors,
title = 'Submissions of banned / unverified /others in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
round(15.3+6.15,2)
21.45
NOTE: About 21.45% of 2018 submissions were made by banned and unverified accounts.
df_peak_submissions.banned_unverified.value_counts().to_frame()
| | banned_unverified |
|---|---|
| others | 18 |
| banned | 10 |
| unverified | 2 |
fig = px.pie(df_peak_submissions.banned_unverified.value_counts().to_frame().reset_index(),
values='banned_unverified', names='index', color_discrete_sequence = colors,
title = f'Submissions of banned / unverified /others on the peak day ({peak_day})')
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()
df_merged.groupby('banned_unverified')['score'].sum().reset_index()
| | banned_unverified | score |
|---|---|---|
| 0 | banned | 22477.0 |
| 1 | others | 217158.0 |
| 2 | unverified | 14718.0 |
fig = px.pie(df_merged.groupby('banned_unverified')['score'].sum().reset_index(),
values='score', names='banned_unverified', color_discrete_sequence = colors,
title = 'Total Scores of banned/ unverified/ others in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
df_merged.groupby('banned_unverified')['score'].mean().reset_index()
| | banned_unverified | score |
|---|---|---|
| 0 | banned | 38.422222 |
| 1 | others | 38.688402 |
| 2 | unverified | 18.653992 |
fig = px.pie(df_merged.groupby('banned_unverified')['score'].mean().reset_index(),
values='score', names='banned_unverified', color_discrete_sequence = colors,
title = 'Average Scores of banned/ unverified/ others in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
df_submissions.groupby('banned_unverified')['score'].mean().reset_index()
| | banned_unverified | score |
|---|---|---|
| 0 | banned | 69.855263 |
| 1 | others | 118.459528 |
| 2 | unverified | 67.008130 |
fig = px.pie(df_submissions.groupby('banned_unverified')['score'].mean().reset_index(),
values='score', names='banned_unverified', color_discrete_sequence = colors,
title = 'Average Scores submissions of banned/ unverified/ others in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
Banned-Accounts largest scores
df_banned.score.describe()
count     585.000000
mean       38.422222
std       157.188792
min       -53.000000
25%         1.000000
50%         3.000000
75%        25.000000
max      2847.000000
Name: score, dtype: float64
# Filter on largest scores
df_scores_high = df_banned.sort_values('score', ascending=False).head(10)
fig = px.bar(df_scores_high,
x='user_name',
y=df_scores_high.score, text = df_scores_high.score, title='Banned accounts highest contribution scores')
fig.update_layout(
xaxis = dict(
title='user name',
tickmode = 'array',
tickvals = df_scores_high.user_name,
)
)
clrs = ['red' if (y > 2000) else '#5296dd' for y in df_scores_high.score]
fig.update_traces(marker_color=clrs,
marker_line_width=2, opacity=1, textposition='auto')
# , marker_line_color='#5296dd'
fig.show()
df_scores_high.user_name
5748    fakeposter77
2589    CJ105
6629    88MPH1
4766    klaudiaschulz
1790    Tony_montana96
1793    Tony_montana96
2013    Series000
4322    vonmark955
2244    horny_fuckers
4341    GWhisperer
Name: user_name, dtype: object
NOTE: the contribution by "fakeposter77" received the highest score (2847).
(Since this user is banned, we have no user information, but we can further investigate their contributions.)
df_crack = df_banned[df_banned.user_name == 'fakeposter77'].sort_values('created_at')
df_crack.head()
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5748 | t3_9w7ir0 | /r/celebnsfw/comments/9w7ir0/amber_heard_in_th... | Amber Heard in The Informers | NaN | r/celebnsfw | 2018-11-11 21:11:41 | Neutral | Neutral | 2847.0 | NaN | ... | True | True | True | NaN | NaN | NaT | banned | banned | NaN | NaN |
1 rows × 24 columns
df_crack.submission_comment.value_counts()
submission    1
Name: submission_comment, dtype: int64
NOTE: "fakeposter77" made only one submission, on 2018-11-11.
df_crack.text.value_counts()
Amber Heard in The Informers    1
Name: text, dtype: int64
Banned-Accounts minimum scores
# Filter on minimum scores
df_scores_low = df_banned.sort_values('score').head(20)
fig = px.bar(df_scores_low,
x='user_name',
y=df_scores_low.score,
text = df_scores_low.score)
fig.update_layout(title_text='Banned accounts minimum contribution scores', title_x=0.5, title_y=0.2)
fig.update_layout(
xaxis = dict(
side='top',
title='user name',
tickmode = 'array',
tickvals = df_scores_low.user_name,
)
)
clrs = ['red' if (y < -50) else '#5296dd' for y in df_scores_low.score]
fig.update_traces(marker_color=clrs,
marker_line_width=2, opacity=1, textposition='auto')
fig.show()
df_scores_low.user_name
6384    DrinkingKratom
6948    thebloodyaugustABC
5671    ThisWasPrettyAnoying
6661    bigbrycm
5672    ThisWasPrettyAnoying
4467    ___0047532899532___
6886    ElegantCyclist
3613    Ride0rDie2020
5898    KOOL69THUGBOI
3615    Ride0rDie2020
3614    Ride0rDie2020
5372    FuckingObsessed
3620    Ride0rDie2020
2173    Gaultier55
5562    GoldPisseR
4701    JesusSkywalkered
5744    Maudraum
3621    Ride0rDie2020
6480    NecroHexr
6660    boobsRlyfe
Name: user_name, dtype: object
# sort a filtered copy instead of sorting a slice in place (avoids SettingWithCopyWarning)
df_pp = df_banned[df_banned.user_name == 'DrinkingKratom'].sort_values('created_at')
df_pp
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6384 | t1_ebzvfsh | /r/crappycontouring/comments/a72jn4/personally... | or she could not wear makeup | t1_ebzssvn | r/crappycontouring | 2018-12-17 19:31:53 | Neutral | Neutral | -53.0 | comment | ... | True | True | True | NaN | NaN | NaT | banned | banned | NaN | NaN |
1 rows × 24 columns
df_pp.submission_comment.value_counts()
comment    1
Name: submission_comment, dtype: int64
# df_amazing.permalink[15599]
Contributions of Banned accounts in 2018
fig = px.pie(df_merged.is_banned.value_counts(),
values='is_banned', names=['others', 'banned'], color_discrete_sequence = colors,
title = 'Contributions of Banned accounts in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
banned_contr_prop = df_banned.shape[0] * 100 /df_merged.shape[0]
banned_contr_prop
print('The percentage % of 2018 contributions made by Banned accounts: ', round(banned_contr_prop, 2))
The percentage % of 2018 contributions made by Banned accounts: 8.38
print('Total banned accounts contributions in 2018: ', df_banned.shape[0])
Total banned accounts contributions in 2018: 586
print('Total banned accounts comments in 2018: ',df_banned.query(" submission_comment == 'comment' ").shape[0])
Total banned accounts comments in 2018: 281
df_banned.query(" submission_comment == 'comment' ").text.value_counts().head(5)
[GIFV link](https://i.imgur.com/LRhwNvS.gifv)\r\n\r\n---\r\n\r\n_^I ^am ^a ^bot. ^[FAQ](https://www.reddit.com/r/livven/wiki/gifv-bot) ^// ^[code](https://github.com/Livven/GifvBot)_    3
Unzip her and have her bend over slightly, bracing against the window. Put on a show for everyone going by...    2
r/celebrityasses    2
/r/Amber_Heard.    2
I guess. But I still don't like the idea of being the enemy to stop the enemy,\n\nBut it is very true that being a good boy doesn't grt you cookies.    1
Name: text, dtype: int64
fig = px.pie(df_submissions.is_banned.value_counts(),
values='is_banned', names=['others', 'banned'], color_discrete_sequence = colors,
title = 'Submissions of Banned accounts in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
banned_sub_prop = df_banned.query(" submission_comment == 'submission' ").shape[0] * 100\
/df_merged.query(" submission_comment == 'submission' ").shape[0]
print('The percentage % of 2018 submissions made by Banned accounts: ', banned_sub_prop)
The percentage % of 2018 submissions made by Banned accounts: 15.25
print('Total banned accounts submissions in 2018: ', df_banned.query(" submission_comment == 'submission' ").shape[0])
Total banned accounts submissions in 2018: 305
df_banned.query(" submission_comment == 'submission' ").text.value_counts().head(5)
Amber Heard                           142
Pinned to Amber Heard on Pinterest      8
Amber Heard - The Informers             8
Amber Heard - London Fields             5
Amber Heard & Amanda Seyfried           3
Name: text, dtype: int64
txt = 'Amber Heard'
df_banned[df_banned['text'] == txt][['user_name', 'subreddit', 'created_at']]
# Note that we don't have data on banned users except for their names
| | user_name | subreddit | created_at |
|---|---|---|---|
| 1790 | Tony_montana96 | r/Celebs | 2018-01-12 19:01:12 |
| 1791 | Tony_montana96 | r/celebritylegs | 2018-06-15 09:27:36 |
| 1792 | Tony_montana96 | r/CelebrityArmpits | 2018-07-13 20:17:59 |
| 1793 | Tony_montana96 | r/Celebs | 2018-07-30 20:22:36 |
| 2013 | Series000 | r/goddesses | 2018-01-23 18:41:17 |
| ... | ... | ... | ... |
| 6840 | tinjaw314w | r/Celebs | 2018-12-26 05:30:05 |
| 6869 | MobileDon | r/Celebs | 2018-12-27 01:35:04 |
| 6870 | ChemicalDuty | r/BeautifulFemales | 2018-12-27 01:35:31 |
| 6915 | Celebrity-Pix | r/celebrities | 2018-12-29 01:25:59 |
| 6976 | BlondeLegion | r/AltBlonde | 2018-12-31 15:40:50 |
142 rows × 3 columns
df_banned[df_banned['text'] == txt][['user_name', 'subreddit', 'created_at']].groupby('user_name').count().sort_values('subreddit', ascending=False).head(11)
| user_name | subreddit | created_at |
|---|---|---|
| vonmark955 | 13 | 13 |
| naughtytwd | 11 | 11 |
| horny_fuckers | 10 | 10 |
| demonicmax56 | 9 | 9 |
| vonjobi951 | 9 | 9 |
| Pm-me-your-ass-photo | 7 | 7 |
| OrganicSatisfaction | 5 | 5 |
| CJ105 | 4 | 4 |
| chrisdeli | 4 | 4 |
| Tony_montana96 | 4 | 4 |
| PussySlayingVirgin | 4 | 4 |
df_banned[df_banned['user_name'] == 'vonmark955'].subreddit.value_counts()
r/Celebsnudess       10
r/Amber_Heard         2
r/CelebSexScenes      2
r/CelebrityNipples    2
r/Celebhub            1
r/celebnsfw           1
r/Celebs              1
r/CelebrityButts      1
Name: subreddit, dtype: int64
df_banned[df_banned['user_name'] == 'vonmark955'].created_at.dt.date.value_counts()
2018-08-27    6
2018-08-15    3
2018-07-24    2
2018-10-05    2
2018-07-31    1
2018-08-28    1
2018-09-29    1
2018-07-22    1
2018-09-28    1
2018-09-25    1
2018-07-25    1
Name: created_at, dtype: int64
df_banned[df_banned['user_name'] == 'naughtytwd'].subreddit.value_counts()
r/Celebs        11
r/geekboners     1
Name: subreddit, dtype: int64
df_banned[df_banned['user_name'] == 'naughtytwd'].created_at.dt.date.value_counts()
2018-06-14    4
2018-06-03    3
2018-07-23    1
2018-06-05    1
2018-07-21    1
2018-07-24    1
2018-06-16    1
Name: created_at, dtype: int64
df_banned[df_banned['user_name'] == 'horny_fuckers'].subreddit.value_counts()
r/celebritylegs      9
r/Celebs             1
r/gentlemanboners    1
Name: subreddit, dtype: int64
df_banned[df_banned['user_name'] == 'horny_fuckers'].created_at.dt.date.value_counts()
2018-05-11    2
2018-04-08    1
2018-02-13    1
2018-05-23    1
2018-05-02    1
2018-05-05    1
2018-03-21    1
2018-04-16    1
2018-03-26    1
2018-03-11    1
Name: created_at, dtype: int64
txt = """Amber Heard Flashing Boobs While Cleaning Up In Her Garage"""
rod = df_banned[df_banned['text'] == txt][['user_name', 'subreddit', 'created_at']]
rod
# Note that we don't have data on banned users except for their names
| | user_name | subreddit | created_at |
|---|---|---|---|
| 4735 | michikoperdue | r/CelebNudes | 2018-08-01 19:14:53 |
| 4736 | michikoperdue | r/Celebsnudess | 2018-08-01 19:17:19 |
| 4737 | michikoperdue | r/celebJObuds | 2018-08-01 19:18:18 |
rod.user_name.value_counts()
michikoperdue    3
Name: user_name, dtype: int64
rod.subreddit.value_counts()
r/CelebNudes      1
r/Celebsnudess    1
r/celebJObuds     1
Name: subreddit, dtype: int64
rod.created_at.dt.date.value_counts()
2018-08-01    3
Name: created_at, dtype: int64
NOTE: "michikoperdue" posted the same text three times within four minutes, in three different subreddits.
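The same pattern (one account, identical text, several subreddits, a short time span) can be searched for across all accounts. The sketch below is one possible approach under stated assumptions: the toy frame reuses the three timestamps from the table above plus one hypothetical unrelated row, the text values are placeholders, and the thresholds (at least 3 subreddits within 10 minutes) are arbitrary choices rather than values from this study.

```python
import pandas as pd

# Three rows with the timestamps from the table above, plus one
# hypothetical unrelated row; text values are placeholders.
toy = pd.DataFrame({
    "user_name": ["michikoperdue"] * 3 + ["someone_else"],
    "subreddit": ["r/CelebNudes", "r/Celebsnudess", "r/celebJObuds", "r/Celebs"],
    "text": ["title A"] * 3 + ["title B"],
    "created_at": pd.to_datetime([
        "2018-08-01 19:14:53", "2018-08-01 19:17:19",
        "2018-08-01 19:18:18", "2018-08-02 10:00:00",
    ]),
})

# Summarise each (user, text) pair: post count, distinct subreddits, time span.
bursts = (
    toy.groupby(["user_name", "text"])
       .agg(n_posts=("subreddit", "size"),
            n_subreddits=("subreddit", "nunique"),
            first=("created_at", "min"),
            last=("created_at", "max"))
       .reset_index()
)
bursts["span_minutes"] = (bursts["last"] - bursts["first"]).dt.total_seconds() / 60

# Assumed thresholds: same text in 3+ subreddits within 10 minutes.
suspicious = bursts[(bursts["n_subreddits"] >= 3) & (bursts["span_minutes"] <= 10)]
print(suspicious[["user_name", "n_subreddits", "span_minutes"]])
```

Replacing `toy` with `df_banned` (or `df_merged`) would list every account showing this burst pattern; the thresholds would need tuning against known cases.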
Banned Accounts Contributions Peaks
# group by date and count
banned_contributions = df_banned.groupby(df_banned.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(banned_contributions,
x='created_at', y='n_contributions')
fig.update_layout(
title={
'text': "The number of banned users contributions created on each date",
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'
})
fig.update_traces(marker_color='#5296dd',
marker_line_width=.5, opacity=1, textposition='auto').update_layout()
fig.show()
banned_contributions.sort_values('n_contributions', ascending=False).head(10)
| | created_at | n_contributions |
|---|---|---|
| 84 | 2018-07-03 | 33 |
| 95 | 2018-07-24 | 19 |
| 191 | 2018-12-20 | 18 |
| 120 | 2018-08-27 | 18 |
| 93 | 2018-07-22 | 17 |
| 190 | 2018-12-19 | 12 |
| 157 | 2018-11-08 | 11 |
| 136 | 2018-09-29 | 11 |
| 113 | 2018-08-16 | 9 |
| 85 | 2018-07-04 | 9 |
# sort by n_contributions, take the top 5, then sort them by date
banned_contributions = banned_contributions.sort_values('n_contributions', ascending=False)
banned_trendy = banned_contributions.head(5).sort_values('created_at')
fig = px.bar(banned_trendy,
x='created_at', y='n_contributions')
fig.update_layout(
title={
'text': "The number of banned users contributions on peak dates",
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'
})
fig.update_layout(
xaxis = dict(
title='Contribution Date',
tickmode = 'array',
tickvals = banned_trendy.created_at,
)
)
clrs = ['red' if (y > 60) else '#5296dd' for y in banned_trendy['n_contributions']]
fig.update_traces(marker_color=clrs,
marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
fig.show()
banned_trendy
| | created_at | n_contributions |
|---|---|---|
| 84 | 2018-07-03 | 33 |
| 93 | 2018-07-22 | 17 |
| 95 | 2018-07-24 | 19 |
| 120 | 2018-08-27 | 18 |
| 191 | 2018-12-20 | 18 |
Explore unverified accounts with the largest link karma
# the 10 unverified users with the highest link karma, then all of their rows
link_high_users = df_unverified.sort_values('link_karma', ascending=False).user_name.unique()[:10]
df_link_high = df_unverified[df_unverified.user_name.isin(link_high_users)]\
.sort_values('link_karma', ascending=False)
# Filter on largest link karma
#df_link_high = df_unverified.sort_values('link_karma', ascending=False).head(10)
fig = px.bar(df_link_high,
x='user_name',
y=df_link_high.link_karma, text = df_link_high.link_karma, title='Unverified accounts with highest link karma')
fig.update_layout(
xaxis = dict(
title='user name',
tickmode = 'array',
tickvals = df_link_high.user_name,
)
)
clrs = ['red' if (y > 1000000) else '#5296dd' for y in df_link_high.link_karma]
fig.update_traces(marker_color=clrs,
marker_line_width=2, opacity=1, textposition='auto')
# , marker_line_color='#5296dd'
fig.show()
with pd.option_context('display.max_colwidth', None, 'display.max_columns', None):
display(df_unverified[df_unverified.user_name == 'Shutupvoices'].head(1))
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | submission_text | user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2665 | t3_84v4ag | /r/gentlemanboners/comments/84v4ag/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-03-16 12:41:49 | Neutral | Neutral | 166.0 | NaN | submission | amber_heard | Shutupvoices | False | False | False | False | 20758.0 | 1008564.0 | 2017-07-02 02:35:59 | unverified | others | 257 days 10:05:50 | 257.0 |
with pd.option_context('display.max_colwidth', None, 'display.max_columns', None):
display(df_unverified[df_unverified.user_name == 'Nergaal'].head(1))
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | submission_text | user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5093 | t3_984rpq | /r/nottheonion/comments/984rpq/johnny_depp_accuses_ex_wife_amber_heard_of_pooing/ | Johnny Depp accuses ex wife Amber Heard of pooing in their bed | NaN | r/nottheonion | 2018-08-17 18:34:03 | Neutral | Negative | 11.0 | NaN | submission | johnny_depp_accuses_ex_wife_amber_heard_of_pooing | Nergaal | False | True | False | False | 119065.0 | 920951.0 | 2015-04-10 14:36:16 | unverified | others | 1225 days 03:57:47 | 1225.0 |
Explore unverified accounts with the largest comment karma
# the 10 unverified users with the highest comment karma, then all of their rows
comment_high_users = df_unverified.sort_values('comment_karma', ascending=False).user_name.unique()[:10]
df_comment_high = df_unverified[df_unverified.user_name.isin(comment_high_users)]\
.sort_values('comment_karma', ascending=False)
# Filter on largest comment karma
fig = px.bar(df_comment_high,
x='user_name',
y=df_comment_high.comment_karma, text = df_comment_high.comment_karma, title='Unverified accounts with highest comment karma')
fig.update_layout(
xaxis = dict(
title='user name',
tickmode = 'array',
tickvals = df_comment_high.user_name,
)
)
clrs = ['red' if (y > 900000) else '#5296dd' for y in df_comment_high.comment_karma]
fig.update_traces(marker_color=clrs,
marker_line_width=2, opacity=1, textposition='auto')
# , marker_line_color='#5296dd'
fig.show()
df_unverified[df_unverified.user_name == 'hnglmkrnglbrry'].user_created_at
6892   2017-01-30 18:40:33
Name: user_created_at, dtype: datetime64[ns]
df_unverified[df_unverified.user_name == 'hnglmkrnglbrry']
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6892 | t1_ecoi1it | /r/politics/comments/aa2auy/amber_heard_i_spok... | Did you just r/gatekeeping domestic violence? | t1_ecoff25 | r/politics | 2018-12-27 20:25:46 | Neutral | Negative | 16.0 | comment | ... | False | True | False | 995853.0 | 721.0 | 2017-01-30 18:40:33 | unverified | others | 696 days 01:45:13 | 696.0 |
1 rows × 24 columns
Account Created at: 2017-01-30
with pd.option_context('display.max_colwidth', None, 'display.max_columns', None):
display(df_unverified[df_unverified.user_name == 'sysadminbj'].head(1))
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | submission_text | user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5517 | t1_e8pvhc1 | /r/movies/comments/9smlbt/amber_heard_film_london_fields_suffers_one_of_the/e8pvhc1/ | I thought that sounded familiar. | t1_e8pv4sn | r/movies | 2018-10-30 11:09:12 | Positive | Neutral | 2.0 | comment | comment | amber_heard_film_london_fields_suffers_one_of_the | sysadminbj | False | False | False | False | 632535.0 | 1073.0 | 2010-10-26 14:20:57 | unverified | others | 2925 days 20:48:15 | 2925.0 |
hnglmkrnglbrry is the unverified account with the highest comment karma (995,853).
Explore unverified accounts with the minimum link karma
# Filter on minimum link karma
df_link_low = df_unverified.sort_values('link_karma').head(20)
fig = px.bar(df_link_low,
x='user_name',
y=df_link_low.link_karma,
text = df_link_low.link_karma)
fig.update_layout(title_text='Unverified accounts with minimum link karma', title_x=0.5)
fig.update_layout(
xaxis = dict(
title='user name',
tickmode = 'array',
tickvals = df_link_low.user_name,
)
)
clrs = ['red' if (y < -60) else '#5296dd' for y in df_link_low.link_karma]
fig.update_traces(marker_color=clrs,
marker_line_width=2, opacity=1, textposition='auto')
fig.show()
df_link_low.user_name.unique()
array(['nsfwalt_100', 'EverGenius', 'windinmynuts',
'giantfatdelicousbird', 'epicimagebot', 'IWannaBangBiel',
'IamsWhoIams', 'Tedmonds30', 'juggalobobby1', 'Sonmeisterbank',
'MrGreen098', 'Ragobie0404'], dtype=object)
Many unverified users have a link karma of 1:
'giantfatdelicousbird', 'giantfatdelicousbird',
'giantfatdelicousbird', 'epicimagebot', 'giantfatdelicousbird',
'giantfatdelicousbird', 'IWannaBangBiel', 'IWannaBangBiel',
'IWannaBangBiel', 'IWannaBangBiel', 'IWannaBangBiel',
'IamsWhoIams', 'Tedmonds30', 'juggalobobby1', 'Sonmeisterbank',
'MrGreen098', 'Ragobie0404'
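Because the data has one row per contribution, active accounts appear several times in the list above. A rough way to quantify the claim is to collapse to one row per account first and then count. The frame below is a hypothetical toy example with made-up user names and karma values, not data from this study.

```python
import pandas as pd

# Hypothetical toy rows: one row per contribution, karma repeated per user.
toy = pd.DataFrame({
    "user_name": ["u1", "u1", "u2", "u3", "u3", "u3", "u4"],
    "link_karma": [1.0, 1.0, 1.0, 250.0, 250.0, 250.0, 1.0],
})

# Collapse to one row per account so frequent posters are not double-counted.
per_user = toy.drop_duplicates("user_name")

n_karma_one = (per_user["link_karma"] == 1).sum()
share = 100 * n_karma_one / len(per_user)
print(n_karma_one, round(share, 1))
```

Applied to `df_unverified`, the same three lines would give the number and percentage of unverified accounts whose link karma is exactly 1.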
Explore unverified accounts with the minimum comment karma
# Filter on minimum comment karma
df_comment_low = df_unverified.sort_values('comment_karma').head(20)
fig = px.bar(df_comment_low,
x='user_name',
y=df_comment_low.comment_karma,
text = df_comment_low.comment_karma)
fig.update_layout(title_text='Unverified accounts with minimum comment karma', title_x=0.5, title_y=0.1)
fig.update_layout(
xaxis = dict(
side='top',
title='user name',
tickmode = 'array',
tickvals = df_comment_low.user_name,
)
)
clrs = ['red' if (y < -30) else '#5296dd' for y in df_comment_low.comment_karma]
fig.update_traces(marker_color=clrs,
marker_line_width=2, opacity=1, textposition='auto')
fig.show()
df_unverified[df_unverified.user_name == 'murph420']
# Defending AH
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2920 | t1_dx71umk | /r/CelebrityFeet/comments/8bg153/amber_heard/d... | What did she hear? | t3_8bg153 | r/CelebrityFeet | 2018-04-11 17:43:44 | Neutral | Neutral | 2.0 | submission | ... | False | False | False | -3.0 | 9.0 | 2016-08-27 10:11:29 | unverified | others | 592 days 07:32:15 | 592.0 |
1 rows × 24 columns
df_unverified[df_unverified.user_name == 'decreased_beast']
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4519 | t1_e2ykiew | /r/celebnsfw/comments/91h1om/amber_heard/e2ykiew/ | I bet u like armpits .. i would love to lick hers | t1_e2ycjwe | r/celebnsfw | 2018-07-24 18:37:34 | Positive | Positive | -5.0 | comment | ... | False | False | False | -3.0 | 1.0 | 2017-06-17 20:51:16 | unverified | others | 401 days 21:46:18 | 401.0 |
1 rows × 24 columns
df_unverified[df_unverified.user_name == 'cozixemera']
# Defending AH
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1931 | t3_7q5eif | /r/SubredditDrama/comments/7q5eif/rharrypotter... | r/harrypotter Duel on the topic of 'domestic a... | NaN | r/SubredditDrama | 2018-01-13 16:30:05 | Neutral | Negative | 1.0 | NaN | ... | False | False | False | -2.0 | 1.0 | 2018-01-13 16:26:14 | unverified | 2018 | 0 days 00:03:51 | 0.0 |
1 rows × 24 columns
murph420 has the minimum comment karma
Explore the largest scores
df_unverified.score.describe()
count    789.000000
mean      18.653992
std       55.205046
min      -64.000000
25%        1.000000
50%        3.000000
75%       11.000000
max      578.000000
Name: score, dtype: float64
# Filter on largest scores
df_scores_high = df_unverified.sort_values('score', ascending=False).head(10)
fig = px.bar(df_scores_high,
x='user_name',
y=df_scores_high.score, text = df_scores_high.score, title='Unverified accounts with highest contribution scores in 2018')
fig.update_layout(
xaxis = dict(
title='user name',
tickmode = 'array',
tickvals = df_scores_high.user_name,
)
)
clrs = ['red' if (y > 500) else '#5296dd' for y in df_scores_high.score]
fig.update_traces(marker_color=clrs,
marker_line_width=2, opacity=1, textposition='auto')
fig.show()
df_scores_high.user_name.head(2)
2927    ibracadabra101
5850    skwolvie41
Name: user_name, dtype: object
with pd.option_context('display.max_colwidth', None, 'display.max_columns', None):
display(df_unverified[df_unverified.user_name == 'ibracadabra101'].head(1))
# Negative Submission
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | submission_text | user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2927 | t3_8c1p7w | /r/Celebs/comments/8c1p7w/amber_heard/ | Amber Heard | NaN | r/Celebs | 2018-04-13 19:03:16 | Neutral | Neutral | 578.0 | NaN | submission | amber_heard | ibracadabra101 | False | False | False | False | 47.0 | 1914.0 | 2017-03-16 17:16:16 | unverified | others | 393 days 01:47:00 | 393.0 |
with pd.option_context('display.max_colwidth', None):
display(df_unverified[df_unverified.user_name == 'skwolvie41'].head(1))
# Negative Submissions and Comments
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5849 | t3_9xspao | /r/Celebs/comments/9xspao/amber_heard/ | Amber Heard | NaN | r/Celebs | 2018-11-17 02:27:32 | Neutral | Neutral | 101.0 | NaN | ... | False | False | False | 25360.0 | 24147.0 | 2016-07-26 09:07:11 | unverified | others | 843 days 17:20:21 | 843.0 |
1 rows × 24 columns
ibracadabra101 has the highest-scoring contribution (578) among the unverified users.
Explore the minimum scores
# Filter on minimum scores
df_scores_low = df_unverified.sort_values('score').head(20)
fig = px.bar(df_scores_low,
x='user_name',
y=df_scores_low.score,
text = df_scores_low.score)
fig.update_layout(title_text='Unverified accounts with minimum contribution scores in 2018', title_x=0.5, title_y=0.2)
fig.update_layout(
xaxis = dict(
side='top',
title='user name',
tickmode = 'array',
tickvals = df_scores_low.user_name,
)
)
clrs = ['red' if (y < -45) else '#5296dd' for y in df_scores_low.score]
fig.update_traces(marker_color=clrs,
marker_line_width=2, opacity=1, textposition='auto')
fig.show()
df_scores_low.user_name.head(3)
3856    HardSellDude
4672    gixxerface
2481    moc_moc_a_moc
Name: user_name, dtype: object
df_unverified[df_unverified.user_name == 'HardSellDude']
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3856 | t1_e1g3qfy | /r/gentlemanboners/comments/8uiuca/amber_heard... | I prefer brunettes with small tits | t3_8uiuca | r/gentlemanboners | 2018-06-28 16:34:37 | Negative | Negative | -64.0 | submission | ... | True | False | False | 16666.0 | 4836.0 | 2017-09-13 20:43:54 | unverified | others | 287 days 19:50:43 | 287.0 |
1 rows × 24 columns
with pd.option_context('display.max_colwidth', None, 'display.max_columns', None):
display(df_unverified[df_unverified.user_name == 'gixxerface'].head(1))
| child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | submission_text | user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4672 | t1_e3duttt | /r/entertainment/comments/93iyoe/johnny_depp_claims_ex_amber_heard_punched_him/e3duttt/ | I would let her punch me in the face as many times she wants. As long as make up sex follows | t3_93iyoe | r/entertainment | 2018-08-01 00:36:29 | Positive | Neutral | -54.0 | submission | comment | johnny_depp_claims_ex_amber_heard_punched_him | gixxerface | False | False | False | False | 225.0 | 33.0 | 2017-07-04 00:21:41 | unverified | others | 393 days 00:14:48 | 393.0 |
with pd.option_context('display.max_colwidth', None):
display(df_unverified[df_unverified.user_name == 'moc_moc_a_moc'].head(2))
# Defending AH
| child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2480 | t1_dvdldd9 | /r/gentlemanboners/comments/82wa5c/amber_heard/dvdldd9/ | She's donating her entire divorce settlement from Depp to charity. If she's supposed to be a gold digger she's not a very good one. | t1_dvddsca | r/gentlemanboners | 2018-03-08 14:58:53 | Negative | Neutral | 161.0 | comment | ... | False | False | False | 20029.0 | 172.0 | 2016-04-19 18:49:26 | unverified | others | 687 days 20:09:27 | 687.0 |
| 2481 | t1_dzdr3xz | /r/gentlemanboners/comments/8l6x7k/amber_heard/dzdr3xz/ | Yawn, somebody always brings this bullshit up. What kind of gold digger donates their entire divorce settlement to charity?\n\nFOH with the dumb misogynist insults. | t1_dzdke3y | r/gentlemanboners | 2018-05-22 11:52:16 | Positive | Neutral | -50.0 | comment | ... | False | False | False | 20029.0 | 172.0 | 2016-04-19 18:49:26 | unverified | others | 762 days 17:02:50 | 762.0 |
2 rows × 24 columns
HardSellDude has the lowest score among the unverified users.
Contributions of Unverified accounts in 2018
Unverified accounts contributions in 2018
fig = px.pie(df_merged.has_verified_email.value_counts(),
values='has_verified_email', names=['verified', 'unverified'], color_discrete_sequence = colors,
title = 'Contributions of Unverified accounts in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
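The pie chart above supplies `names=['verified', 'unverified']` by hand, so the labels are only correct if `value_counts()` happens to return `True` first. A minimal sketch of one way to avoid that, using a toy Series in place of `df_merged.has_verified_email`; `label_counts` is a hypothetical helper, not part of this notebook:

```python
import pandas as pd

def label_counts(flags, labels):
    """Count boolean flags and return a tidy frame with explicit labels.

    `labels` maps each flag value to its display name, so the result does
    not depend on the order in which value_counts() returns the groups.
    """
    named = flags.map(labels)
    return (named.value_counts()
                 .rename_axis("label")
                 .reset_index(name="n_contributions"))

# Toy stand-in for df_merged.has_verified_email (assumed data, not the real set)
toy = pd.Series([False, False, False, True], name="has_verified_email")
counts = label_counts(toy, {True: "verified", False: "unverified"})
print(counts)
```

A frame shaped like this could then be passed to `px.pie(counts, values='n_contributions', names='label')`. Naming the columns explicitly also sidesteps differences between pandas versions in how `value_counts()` names its output.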
unverified_contr_prop = df_unverified.shape[0] * 100 /df_merged.shape[0]
print('The percentage % of 2018 contributions made by Unverified accounts: ', round(unverified_contr_prop,2))
The percentage % of 2018 contributions made by Unverified accounts: 11.28
print('Total unverified accounts contributions in 2018: ', df_unverified.shape[0])
Total unverified accounts contributions in 2018: 789
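The same percentage calculation recurs several times in this notebook. As a rough sketch, it could be wrapped in a small function; `share_pct` is a hypothetical name, and the inputs below are the row counts printed in the outputs above:

```python
def share_pct(part, whole, ndigits=2):
    """Return part/whole as a percentage rounded to `ndigits` places."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return round(part * 100 / whole, ndigits)

# Row counts reported in the notebook output
n_unverified = 789   # df_unverified.shape[0]
n_total = 6993       # df_merged.shape[0]
print(share_pct(n_unverified, n_total))  # 11.28
```

Rounding inside the helper keeps every printed percentage at the same precision.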
peak_day
'2018-07-03'
fig = px.pie(df_peak.has_verified_email.value_counts(),
values='has_verified_email', names=['verified', 'unverified'], color_discrete_sequence = colors,
title = f'Contributions of Unverified accounts on peak day ({peak_day})')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
print('Total unverified accounts comments in 2018: ', df_unverified.query(" submission_comment == 'comment' ").shape[0])
Total unverified accounts comments in 2018: 666
df_unverified.query(" submission_comment == 'comment' ").text.value_counts().head(5)
What did she hear?    3
MFK                   2
What is this from?    2
r/Lordosis            2
Mfk                   2
Name: text, dtype: int64
fig = px.pie(df_submissions.has_verified_email.value_counts(),
values='has_verified_email', names=['verified', 'unverified'], color_discrete_sequence = colors,
title = 'Submissions of Unverified accounts in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
unverified_sub_prop = df_unverified.query(" submission_comment == 'submission' ").shape[0] * 100\
/df_merged.query(" submission_comment == 'submission' ").shape[0]
print('The percentage % of 2018 submissions made by Unverified accounts: ', unverified_sub_prop)
The percentage % of 2018 submissions made by Unverified accounts: 6.15
print('Total unverified accounts submissions in 2018: ', df_unverified.query(" submission_comment == 'submission' ").shape[0])
Total unverified accounts submissions in 2018: 123
df_unverified.query(" submission_comment == 'submission' ").text.value_counts().head(5)
Amber Heard    49
LONDON FIELDS Official Trailer (2018) Amber Heard, Johnny Depp Movie HD    5
Amber Heard: I Spoke Up Against Sexual Violence and Faced Our Culture's Wrath    2
amber heard lifestyle    2
Edited: Amber Heard, Gal Gadot and Margot Robbie    2
Name: text, dtype: int64
txt = 'Amber Heard'
df_unverified[df_unverified['text'] == txt][['user_name', 'user_created_at', 'submission_comment', 'created_at', 'subreddit']]
# Note that we don't have data on banned users except for their names.
| user_name | user_created_at | submission_comment | created_at | subreddit | |
|---|---|---|---|---|---|
| 2665 | Shutupvoices | 2017-07-02 02:35:59 | submission | 2018-03-16 12:41:49 | r/gentlemanboners |
| 2770 | ClementePark | 2015-08-27 01:44:37 | submission | 2018-03-26 18:43:27 | r/Celebs |
| 2805 | SilviaJu | 2018-02-14 03:47:09 | submission | 2018-03-28 21:12:21 | r/Celebs |
| 2825 | ilikeskirt | 2016-02-11 22:04:59 | submission | 2018-03-29 13:23:28 | r/SlitDresses |
| 2927 | ibracadabra101 | 2017-03-16 17:16:16 | submission | 2018-04-13 19:03:16 | r/Celebs |
| 3006 | arcane84 | 2016-01-08 05:51:38 | submission | 2018-12-09 07:39:38 | r/gentlemanboners |
| 3007 | arcane84 | 2016-01-08 05:51:38 | submission | 2018-12-09 10:25:11 | r/Celebs |
| 3626 | ScaredBite5 | 2018-06-07 21:14:20 | submission | 2018-06-07 21:25:47 | r/CelebsInTights |
| 4218 | Divafap1 | 2016-07-05 08:09:07 | submission | 2018-07-12 04:35:42 | r/Celebs |
| 4219 | Divafap1 | 2016-07-05 08:09:07 | submission | 2018-07-25 22:31:58 | r/Celebs |
| 4223 | Divafap1 | 2016-07-05 08:09:07 | submission | 2018-08-25 05:31:20 | r/Celebs |
| 4307 | Nbaslamindub | 2018-07-21 21:21:27 | submission | 2018-07-21 21:22:03 | r/Celebs |
| 5299 | LOVESPANTYHOSEDWOMEN | 2017-01-05 19:10:04 | submission | 2018-09-18 11:38:39 | r/CelebsInTights |
| 5369 | Win168k | 2018-08-03 17:35:12 | submission | 2018-10-04 18:19:49 | u/Win168k |
| 5370 | Win168k | 2018-08-03 17:35:12 | submission | 2018-10-26 18:14:26 | u/Win168k |
| 5460 | Jtondi07 | 2016-09-04 16:25:26 | comment | 2018-10-23 15:19:55 | r/pickoneceleb |
| 5463 | elect99 | 2017-04-30 09:32:16 | submission | 2018-10-23 19:27:40 | r/gentlemanboners |
| 5464 | elect99 | 2017-04-30 09:32:16 | submission | 2018-10-23 19:27:40 | r/Celebs |
| 5465 | elect99 | 2017-04-30 09:32:16 | submission | 2018-10-23 19:29:03 | r/sexycelebs |
| 5466 | DowntownHand | 2018-09-28 20:18:22 | submission | 2018-10-23 19:27:45 | r/Celebhub |
| 5467 | DowntownHand | 2018-09-28 20:18:22 | submission | 2018-10-23 19:27:51 | r/goddesses |
| 5496 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-12 15:30:08 | r/sexycelebs |
| 5497 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-12 16:13:01 | r/goddesses |
| 5498 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-15 10:53:06 | r/sexycelebs |
| 5499 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-22 10:53:19 | r/sexycelebs |
| 5500 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-12-15 18:26:03 | r/sexycelebs |
| 5591 | prettycoolu | 2018-09-12 10:31:32 | submission | 2018-11-02 11:03:10 | r/PortraitsPorn |
| 5716 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-11-10 09:16:46 | r/Celebhub |
| 5717 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-11-23 10:41:32 | r/Celebhub |
| 5718 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 13:46:12 | r/Celebhub |
| 5719 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:09:05 | r/Celebhub |
| 5720 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:09:22 | r/Celebhub |
| 5721 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:11:35 | r/Celebhub |
| 5722 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:27:53 | r/Celebhub |
| 5723 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 15:05:37 | r/Celebhub |
| 5794 | unimasop | 2018-09-20 20:55:11 | submission | 2018-11-13 02:32:12 | r/Celebs |
| 5795 | unimasop | 2018-09-20 20:55:11 | submission | 2018-11-13 02:32:16 | r/Celebhub |
| 5833 | jackslater55 | 2018-09-15 09:10:33 | submission | 2018-11-15 17:55:33 | r/Celebs |
| 5834 | jackslater55 | 2018-09-15 09:10:33 | submission | 2018-11-27 09:42:24 | r/Celebs |
| 5844 | newwavedude | 2015-01-12 08:48:57 | submission | 2018-11-16 15:33:20 | r/Celebs |
| 5849 | skwolvie41 | 2016-07-26 09:07:11 | submission | 2018-11-17 02:27:32 | r/Celebs |
| 5850 | skwolvie41 | 2016-07-26 09:07:11 | submission | 2018-12-27 11:02:09 | r/Celebs |
| 6229 | mostepsic | 2018-09-20 02:02:22 | submission | 2018-12-12 23:45:46 | r/Celebs |
| 6364 | BekerBroun | 2013-10-19 05:00:15 | submission | 2018-12-16 12:50:16 | r/gentlemanboners |
| 6681 | girlnamedmartin | 2018-02-27 02:50:42 | submission | 2018-12-23 04:04:13 | r/PrettyGirls |
| 6735 | pietten | 2016-12-27 02:15:46 | submission | 2018-12-21 07:05:47 | r/gentlemanboners |
| 6736 | pietten | 2016-12-27 02:15:46 | submission | 2018-12-21 07:07:06 | r/Celebhub |
| 6908 | corporate_monster777 | 2018-11-14 17:24:55 | submission | 2018-12-28 19:33:55 | r/gentlemanboners |
| 6909 | corporate_monster777 | 2018-11-14 17:24:55 | submission | 2018-12-28 19:34:47 | r/Celebs |
| 6910 | corporate_monster777 | 2018-11-14 17:24:55 | submission | 2018-12-30 10:30:49 | r/Celebs |
temp = df_unverified[df_unverified['text'] == txt][['user_name', 'user_created_at', 'submission_comment', 'created_at', 'subreddit']]
temp[temp.user_name.isin(temp.user_name.value_counts().head(6).index.values)]
| user_name | user_created_at | submission_comment | created_at | subreddit | |
|---|---|---|---|---|---|
| 4218 | Divafap1 | 2016-07-05 08:09:07 | submission | 2018-07-12 04:35:42 | r/Celebs |
| 4219 | Divafap1 | 2016-07-05 08:09:07 | submission | 2018-07-25 22:31:58 | r/Celebs |
| 4223 | Divafap1 | 2016-07-05 08:09:07 | submission | 2018-08-25 05:31:20 | r/Celebs |
| 5463 | elect99 | 2017-04-30 09:32:16 | submission | 2018-10-23 19:27:40 | r/gentlemanboners |
| 5464 | elect99 | 2017-04-30 09:32:16 | submission | 2018-10-23 19:27:40 | r/Celebs |
| 5465 | elect99 | 2017-04-30 09:32:16 | submission | 2018-10-23 19:29:03 | r/sexycelebs |
| 5496 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-12 15:30:08 | r/sexycelebs |
| 5497 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-12 16:13:01 | r/goddesses |
| 5498 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-15 10:53:06 | r/sexycelebs |
| 5499 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-22 10:53:19 | r/sexycelebs |
| 5500 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-12-15 18:26:03 | r/sexycelebs |
| 5716 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-11-10 09:16:46 | r/Celebhub |
| 5717 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-11-23 10:41:32 | r/Celebhub |
| 5718 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 13:46:12 | r/Celebhub |
| 5719 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:09:05 | r/Celebhub |
| 5720 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:09:22 | r/Celebhub |
| 5721 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:11:35 | r/Celebhub |
| 5722 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:27:53 | r/Celebhub |
| 5723 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 15:05:37 | r/Celebhub |
| 6735 | pietten | 2016-12-27 02:15:46 | submission | 2018-12-21 07:05:47 | r/gentlemanboners |
| 6736 | pietten | 2016-12-27 02:15:46 | submission | 2018-12-21 07:07:06 | r/Celebhub |
| 6908 | corporate_monster777 | 2018-11-14 17:24:55 | submission | 2018-12-28 19:33:55 | r/gentlemanboners |
| 6909 | corporate_monster777 | 2018-11-14 17:24:55 | submission | 2018-12-28 19:34:47 | r/Celebs |
| 6910 | corporate_monster777 | 2018-11-14 17:24:55 | submission | 2018-12-30 10:30:49 | r/Celebs |
# Doc-Sidious is already among the top-6 posters, so one mask is enough
temp[temp.user_name == 'Doc-Sidious']
| user_name | user_created_at | submission_comment | created_at | subreddit | |
|---|---|---|---|---|---|
| 5716 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-11-10 09:16:46 | r/Celebhub |
| 5717 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-11-23 10:41:32 | r/Celebhub |
| 5718 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 13:46:12 | r/Celebhub |
| 5719 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:09:05 | r/Celebhub |
| 5720 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:09:22 | r/Celebhub |
| 5721 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:11:35 | r/Celebhub |
| 5722 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 14:27:53 | r/Celebhub |
| 5723 | Doc-Sidious | 2018-11-05 14:28:02 | submission | 2018-12-02 15:05:37 | r/Celebhub |
temp[temp.user_name == 'Doc-Sidious'].created_at.dt.date.value_counts()
2018-12-02    6
2018-11-10    1
2018-11-23    1
Name: created_at, dtype: int64
User Doc-Sidious made 8 submissions, all in the same subreddit (r/Celebhub); 6 of them were made on 2018-12-02.
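The per-day count above can be generalised to find each user's busiest day. A minimal sketch, assuming a toy frame that reuses the Doc-Sidious timestamps from the table above; the names `posts`, `per_day` and `busiest` are illustrative only:

```python
import pandas as pd

# Toy rows mirroring the Doc-Sidious timestamps shown in the table above
posts = pd.DataFrame({
    "user_name": ["Doc-Sidious"] * 8,
    "created_at": pd.to_datetime([
        "2018-11-10 09:16:46", "2018-11-23 10:41:32",
        "2018-12-02 13:46:12", "2018-12-02 14:09:05",
        "2018-12-02 14:09:22", "2018-12-02 14:11:35",
        "2018-12-02 14:27:53", "2018-12-02 15:05:37",
    ]),
})

# Number of posts per user per calendar day
per_day = (posts.groupby(["user_name", posts["created_at"].dt.date])
                .size()
                .rename("n_posts")
                .reset_index())

# Keep the single busiest day for each user
busiest = per_day.loc[per_day.groupby("user_name")["n_posts"].idxmax()]
print(busiest)
```

Run against the full `temp` frame, the same grouping would list every account's most active day, which makes same-day bursts like this one easier to spot.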
df_unverified[df_unverified['user_name'] == 'Doc-Sidious'].head(1)[[ 'user_name',
'has_verified_email', 'is_mod', 'is_gold', 'is_banned', 'comment_karma',
'link_karma', 'user_created_at', 'banned_unverified', 'creation_year',
'days_after_creation']]
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | days_after_creation | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5716 | Doc-Sidious | False | False | False | False | 99.0 | 44851.0 | 2018-11-05 14:28:02 | unverified | 2018 | 4.0 |
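The `diff` and `days_after_creation` columns are not derived in this section. The sketch below shows one derivation that is consistent with the values displayed (for example, `393 days 01:47:00` alongside `393.0`); treat it as an assumption rather than the code actually used to build the dataset:

```python
import pandas as pd

# One row using the Doc-Sidious timestamps from the tables above
row = pd.DataFrame({
    "created_at": pd.to_datetime(["2018-11-10 09:16:46"]),
    "user_created_at": pd.to_datetime(["2018-11-05 14:28:02"]),
})

# Assumed derivation: elapsed time, then whole days with the fraction dropped
row["diff"] = row["created_at"] - row["user_created_at"]
row["days_after_creation"] = row["diff"].dt.days.astype(float)
print(row[["diff", "days_after_creation"]])
```

Under this assumption, the first Doc-Sidious submission comes out at 4 whole days after account creation, which matches the `4.0` in the table.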
# CelebsSpot is already among the top-6 posters, so one mask is enough
temp[temp.user_name == 'CelebsSpot']
| user_name | user_created_at | submission_comment | created_at | subreddit | |
|---|---|---|---|---|---|
| 5496 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-12 15:30:08 | r/sexycelebs |
| 5497 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-12 16:13:01 | r/goddesses |
| 5498 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-15 10:53:06 | r/sexycelebs |
| 5499 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-11-22 10:53:19 | r/sexycelebs |
| 5500 | CelebsSpot | 2018-10-26 06:12:58 | submission | 2018-12-15 18:26:03 | r/sexycelebs |
temp[temp.user_name == 'CelebsSpot'].created_at.dt.date.value_counts()
2018-11-12    2
2018-11-22    1
2018-11-15    1
2018-12-15    1
Name: created_at, dtype: int64
df_unverified[df_unverified['user_name'] == 'CelebsSpot'].head(1)[[ 'user_name',
'has_verified_email', 'is_mod', 'is_gold', 'is_banned', 'comment_karma',
'link_karma', 'user_created_at', 'banned_unverified', 'creation_year',
'days_after_creation']]
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | days_after_creation | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5495 | CelebsSpot | False | True | False | False | 250.0 | 42250.0 | 2018-10-26 06:12:58 | unverified | 2018 | 3.0 |
User CelebsSpot made 5 submissions titled "Amber Heard", 4 of them in the same subreddit (r/sexycelebs).
Account creation year of the unverified accounts that contributed in 2018
fig = px.pie(df_unverified.creation_year.value_counts().to_frame().reset_index(),
values='creation_year', names='index', color_discrete_sequence = colors,
title = 'Contributions of Unverified accounts in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
fig = px.pie(df_unverified_peak.creation_year.value_counts().to_frame().reset_index(),
values='creation_year', names='index', color_discrete_sequence = colors,
title = f'Contributions of unverified accounts on the peak day ({peak_day})')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
fig = px.pie(df_unverified_submissions.creation_year.value_counts().to_frame().reset_index(),
values='creation_year', names='index', color_discrete_sequence = colors,
title = 'Submissions of Unverified accounts in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
df_unverified_peak_submissions
| child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4018 | t3_8vtmib | /r/NoFilterNews/comments/8vtmib/amber_heard_fa... | Amber Heard fans upset after she posted a raci... | NaN | r/NoFilterNews | 2018-07-03 16:33:19 | Neutral | Negative | 1.0 | NaN | ... | True | False | False | 58.0 | 33057.0 | 2013-08-24 19:00:23 | unverified | others | 1773 days 21:32:56 | 1773.0 |
| 4111 | t3_8vwekz | /r/awfuleverything/comments/8vwekz/is_it_just_... | Is it just me? (Amber Heard) | NaN | r/awfuleverything | 2018-07-03 22:20:41 | Neutral | Neutral | 19.0 | NaN | ... | False | False | False | 4587.0 | 149.0 | 2016-01-14 00:00:25 | unverified | others | 901 days 22:20:16 | 901.0 |
2 rows × 24 columns
peak_day
'2018-07-03'
Unverified accounts made 2 submissions on the peak day (2018-07-03).
Which dates had the most contributions from unverified accounts?
# group by date and count
unverified_contributions = df_unverified.groupby(df_unverified.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(unverified_contributions,
x='created_at', y='n_contributions')
fig.update_layout(
title={
'text': "The number of contributions by users with unverified emails",
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'
})
fig.update_traces(marker_color='#5296dd',
marker_line_width=1, opacity=1, textposition='auto').update_layout()
fig.show()
# sort by n_contributions, take the top 5, then sort them by date
# (non-inplace sorts avoid modifying a slice of the original frame)
unverified_contributions = unverified_contributions.sort_values('n_contributions', ascending=False)
unverified_trendy = unverified_contributions.head(5).sort_values('created_at')
fig = px.bar(unverified_trendy,
x='created_at', y='n_contributions')
fig.update_layout(
title={
'text': "The number of contributions by unverified users on peak dates",
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'
})
fig.update_layout(
xaxis = dict(
title='Contribution Date',
tickmode = 'array',
tickvals = unverified_trendy.created_at,
)
)
fig.update_traces(marker_color='#5296dd',
marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
fig.show()
unverified_trendy
| created_at | n_contributions | |
|---|---|---|
| 68 | 2018-06-06 | 22 |
| 84 | 2018-07-03 | 21 |
| 109 | 2018-08-01 | 22 |
| 188 | 2018-12-19 | 38 |
| 189 | 2018-12-20 | 28 |
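The "sort by count, keep the top n, re-sort by date" step can also be isolated as a small function that is easy to test. A minimal sketch using only the standard library; `top_days` is a hypothetical helper and the sample dates are made up for illustration:

```python
from collections import Counter
from datetime import date

def top_days(dates, n=5):
    """Return the n most frequent dates as (date, count) pairs, in date order."""
    busiest = Counter(dates).most_common(n)
    return sorted(busiest)

# Toy input (assumed data): three contributions on one day, two on another, one on a third
sample = [date(2018, 12, 19)] * 3 + [date(2018, 6, 6)] + [date(2018, 7, 3)] * 2
print(top_days(sample, n=2))
```

Here the two busiest days are returned chronologically, mirroring the layout of the `unverified_trendy` table.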
fig = px.pie(df_merged.creation_year.value_counts().to_frame().reset_index(),
values='creation_year', names='index', color_discrete_sequence = colors,
title = 'Contributions of 2018 accounts in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
fig = px.pie(df_peak.creation_year.value_counts().to_frame().reset_index(),
values='creation_year', names='index', color_discrete_sequence = colors,
title = f'Contributions of 2018 accounts on the peak day ({peak_day})')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
fig = px.pie(df_submissions.creation_year.value_counts().to_frame().reset_index(),
values='creation_year', names='index', color_discrete_sequence = colors,
title = 'Submissions of 2018 accounts in 2018')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
NOTE: About 12% of the 2018 submissions were made by accounts created in 2018.
fig = px.pie(df_peak.query("submission_comment == 'submission'").creation_year.value_counts().to_frame().reset_index(),
values='creation_year', names='index', color_discrete_sequence = colors,
title = f'Submissions of 2018 accounts on the peak day ({peak_day})')
fig.update_traces(textposition='inside', textinfo='percent+label+value')
fig.show()
df_peak.shape[0]
190
df_peak.query("submission_comment == 'submission'").shape[0]
30
190-30
160
NOTE: The peak day has only 30 submissions; the remaining 160 contributions are comments and replies.
print('Total contributions in 2018 made by newly created accounts(2018): ',df_18.shape[0])
contr_prop_18 = df_18.shape[0] * 100 /df_merged.shape[0]
print('The percentage % of 2018 contributions made by newly created accounts(2018): ',round(contr_prop_18,2))
Total contributions in 2018 made by newly created accounts(2018): 875
The percentage % of 2018 contributions made by newly created accounts(2018): 12.51
print('Total comments in 2018 made by newly created accounts(2018): ', df_18.query(" submission_comment == 'comment' ").shape[0])
Total comments in 2018 made by newly created accounts(2018): 636
print('Total submissions in 2018 made by newly created accounts(2018): ', df_18.query(" submission_comment == 'submission' ").shape[0])
sub_prop_18 = df_18.query(" submission_comment == 'submission' ").shape[0] * 100\
/df_merged.query(" submission_comment == 'submission' ").shape[0]
print('The percentage % of 2018 submissions made by newly created accounts(2018): ', sub_prop_18)
Total submissions in 2018 made by newly created accounts(2018): 239
The percentage % of 2018 submissions made by newly created accounts(2018): 11.95
2018 Contributions Scores
Explore the largest scores
df_merged.score.describe()
count    6987.000000
mean       36.403750
std       170.110128
min       -92.000000
25%         1.000000
50%         3.000000
75%        16.000000
max      4188.000000
Name: score, dtype: float64
# Filter on largest scores
df_scores_high = df_merged.sort_values('score', ascending=False).head(10)
fig = px.bar(df_scores_high,
x='user_name',
y=df_scores_high.score, text = df_scores_high.score, title='Accounts with highest contribution scores in 2018')
fig.update_layout(
xaxis = dict(
title='user name',
tickmode = 'array',
tickvals = df_scores_high.user_name,
)
)
clrs = ['red' if (y > 3500) else '#5296dd' for y in df_scores_high.score]
fig.update_traces(marker_color=clrs,
marker_line_width=2, opacity=1, textposition='auto')
fig.show()
df_scores_high.sort_values('score', ascending=False).user_name
579                -banned-
1134               -banned-
2426               emilyguy
369                -banned-
6936    NaughtyJessThoughts
2429               emilyguy
410                -banned-
182                -banned-
5748           fakeposter77
4424               Ezio9619
Name: user_name, dtype: object
with pd.option_context('display.max_colwidth', None, 'display.max_columns', None):
display(df_merged[df_merged.user_name == 'emilyguy'])
| child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | submission_text | user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2426 | t3_82wa5c | /r/gentlemanboners/comments/82wa5c/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-03-08 09:30:44 | Neutral | Neutral | 3814.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 786 days 14:47:55 | 786.0 |
| 2427 | t3_842r8g | /r/gentlemanboners/comments/842r8g/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-03-13 09:09:02 | Neutral | Neutral | 442.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 791 days 14:26:13 | 791.0 |
| 2428 | t3_85j0o6 | /r/gentlemanboners/comments/85j0o6/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-03-19 12:18:13 | Neutral | Neutral | 595.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 797 days 17:35:24 | 797.0 |
| 2429 | t3_8694b7 | /r/WatchItForThePlot/comments/8694b7/amber_heard_has_an_amazing_body/ | Amber Heard has an amazing body | NaN | r/WatchItForThePlot | 2018-03-22 05:16:10 | Positive | Positive | 3165.0 | NaN | submission | amber_heard_has_an_amazing_body | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 800 days 10:33:21 | 800.0 |
| 2430 | t3_8694cl | /r/celebsnaked/comments/8694cl/amber_heard/ | Amber Heard | NaN | r/celebsnaked | 2018-03-22 05:16:21 | Neutral | Neutral | 316.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 800 days 10:33:32 | 800.0 |
| 2431 | t3_8694ge | /r/celebnsfw/comments/8694ge/amber_heard/ | Amber Heard | NaN | r/celebnsfw | 2018-03-22 05:17:00 | Neutral | Neutral | 440.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 800 days 10:34:11 | 800.0 |
| 2432 | t1_dw3ksem | /r/WatchItForThePlot/comments/8694b7/amber_heard_has_an_amazing_body/dw3ksem/ | The Informers (2008) | t1_dw3k8b6 | r/WatchItForThePlot | 2018-03-22 11:29:16 | Neutral | Neutral | 12.0 | comment | comment | amber_heard_has_an_amazing_body | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 800 days 16:46:27 | 800.0 |
| 2433 | t1_dw4zgth | /r/celebnsfw/comments/8694ge/amber_heard/dw4zgth/ | The Informers (2008) | t1_dw4xwws | r/celebnsfw | 2018-03-23 01:35:27 | Neutral | Neutral | 1.0 | comment | comment | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 801 days 06:52:38 | 801.0 |
| 2434 | t3_86sot8 | /r/gentlemanboners/comments/86sot8/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-03-24 12:22:42 | Neutral | Neutral | 158.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 802 days 17:39:53 | 802.0 |
| 2435 | t3_8c2dua | /r/Celebs/comments/8c2dua/amber_heard/ | Amber Heard | NaN | r/Celebs | 2018-04-13 20:33:21 | Neutral | Neutral | 418.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 823 days 01:50:32 | 823.0 |
| 2436 | t1_dxbiw14 | /r/Celebs/comments/8c2dua/amber_heard/dxbiw14/ | The Playboy Club (2011) | t3_8c2dua | r/Celebs | 2018-04-13 20:33:45 | Neutral | Neutral | 4.0 | submission | comment | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 823 days 01:50:56 | 823.0 |
| 2437 | t3_8gbrwm | /r/gentlemanboners/comments/8gbrwm/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-05-01 20:47:43 | Neutral | Neutral | 77.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 841 days 02:04:54 | 841.0 |
| 2438 | t3_8hb267 | /r/celebsnaked/comments/8hb267/amber_heard/ | Amber Heard | NaN | r/celebsnaked | 2018-05-05 22:51:05 | Neutral | Neutral | 147.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 845 days 04:08:16 | 845.0 |
| 2439 | t3_8hb2g4 | /r/celebnsfw/comments/8hb2g4/amber_heard/ | Amber Heard | NaN | r/celebnsfw | 2018-05-05 22:52:18 | Neutral | Neutral | 509.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 845 days 04:09:29 | 845.0 |
| 2440 | t3_8hb2jx | /r/Celebs/comments/8hb2jx/amber_heard/ | Amber Heard | NaN | r/Celebs | 2018-05-05 22:52:45 | Neutral | Neutral | 516.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 845 days 04:09:56 | 845.0 |
| 2441 | t1_dyjcjj9 | /r/Celebs/comments/8hb2jx/amber_heard/dyjcjj9/ | The Informers | t1_dyjceib | r/Celebs | 2018-05-06 14:34:19 | Neutral | Neutral | 3.0 | comment | comment | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 845 days 19:51:30 | 845.0 |
| 2442 | t3_8jvhpz | /r/gentlemanboners/comments/8jvhpz/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-05-16 14:48:15 | Neutral | Neutral | 231.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 855 days 20:05:26 | 855.0 |
| 2443 | t3_8m0z5o | /r/gentlemanboners/comments/8m0z5o/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-05-25 11:21:36 | Neutral | Neutral | 617.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 864 days 16:38:47 | 864.0 |
| 2444 | t3_8qcee6 | /r/gentlemanboners/comments/8qcee6/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-06-11 19:47:28 | Neutral | Neutral | 195.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 882 days 01:04:39 | 882.0 |
| 2445 | t3_8sc5fj | /r/gentlemanboners/comments/8sc5fj/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-06-19 20:24:11 | Neutral | Neutral | 103.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 890 days 01:41:22 | 890.0 |
| 2446 | t3_8ujtgq | /r/gentlemanboners/comments/8ujtgq/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-06-28 14:16:46 | Neutral | Neutral | 11.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 898 days 19:33:57 | 898.0 |
| 2447 | t3_8vqlv6 | /r/Celebs/comments/8vqlv6/amber_heard/ | Amber Heard | NaN | r/Celebs | 2018-07-03 08:38:02 | Neutral | Neutral | 1041.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 903 days 13:55:13 | 903.0 |
| 2448 | t3_8vzeyv | /r/gentlemanboners/comments/8vzeyv/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-07-04 06:20:16 | Neutral | Neutral | 93.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 904 days 11:37:27 | 904.0 |
| 2449 | t3_8wec1y | /r/gentlemanboners/comments/8wec1y/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-07-05 21:32:00 | Neutral | Neutral | 494.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 906 days 02:49:11 | 906.0 |
| 2450 | t3_8y7edh | /r/gentlemanboners/comments/8y7edh/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-07-12 06:42:02 | Neutral | Neutral | 119.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 912 days 11:59:13 | 912.0 |
| 2451 | t3_91b2de | /r/gentlemanboners/comments/91b2de/amber_heard_and_jessica_alba/ | Amber Heard and Jessica Alba | NaN | r/gentlemanboners | 2018-07-23 21:15:41 | Neutral | Neutral | 316.0 | NaN | submission | amber_heard_and_jessica_alba | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 924 days 02:32:52 | 924.0 |
| 2452 | t3_98f1dv | /r/gentlemanboners/comments/98f1dv/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-08-18 21:18:18 | Neutral | Neutral | 276.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 950 days 02:35:29 | 950.0 |
| 2453 | t3_9aikvm | /r/gentlemanboners/comments/9aikvm/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-08-26 20:11:26 | Neutral | Neutral | 241.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 958 days 01:28:37 | 958.0 |
| 2454 | t3_9goioy | /r/gentlemanboners/comments/9goioy/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-09-17 21:30:38 | Neutral | Neutral | 58.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 980 days 02:47:49 | 980.0 |
| 2455 | t3_9iwj13 | /r/gentlemanboners/comments/9iwj13/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-09-25 21:58:08 | Neutral | Neutral | 206.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 988 days 03:15:19 | 988.0 |
| 2456 | t3_9ligl2 | /r/Celebs/comments/9ligl2/amber_heard/ | Amber Heard | NaN | r/Celebs | 2018-10-05 02:39:33 | Neutral | Neutral | 206.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 997 days 07:56:44 | 997.0 |
| 2457 | t3_9lihq6 | /r/celebsnaked/comments/9lihq6/amber_heard/ | Amber Heard | NaN | r/celebsnaked | 2018-10-05 02:43:57 | Neutral | Neutral | 266.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 997 days 08:01:08 | 997.0 |
| 2458 | t3_9oym7g | /r/gentlemanboners/comments/9oym7g/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | 2018-10-17 13:22:22 | Neutral | Neutral | 225.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 1009 days 18:39:33 | 1009.0 |
| 2459 | t3_9t9agd | /r/Celebs/comments/9t9agd/amber_heard/ | Amber Heard | NaN | r/Celebs | 2018-11-01 13:12:56 | Neutral | Neutral | 193.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 1024 days 18:30:07 | 1024.0 |
| 2460 | t3_9xew1l | /r/Celebs/comments/9xew1l/amber_heard/ | Amber Heard | NaN | r/Celebs | 2018-11-15 20:22:16 | Neutral | Neutral | 448.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 1039 days 01:39:27 | 1039.0 |
| 2461 | t3_a6qp82 | /r/Celebs/comments/a6qp82/amber_heard/ | Amber Heard | NaN | r/Celebs | 2018-12-16 17:07:53 | Neutral | Neutral | 164.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 1069 days 22:25:04 | 1069.0 |
| 2462 | t3_a8n49w | /r/CaraDelevingne/comments/a8n49w/with_amber_heard/ | with Amber Heard | NaN | r/CaraDelevingne | 2018-12-22 18:26:05 | Neutral | Neutral | 272.0 | NaN | submission | with_amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 1075 days 23:43:16 | 1075.0 |
| 2463 | t3_aaot4i | /r/celebsnaked/comments/aaot4i/amber_heard/ | Amber Heard | NaN | r/celebsnaked | 2018-12-29 20:44:00 | Neutral | Neutral | 389.0 | NaN | submission | amber_heard | emilyguy | True | True | False | False | 26585.0 | 1836704.0 | 2016-01-11 18:42:49 | others | others | 1083 days 02:01:11 | 1083.0 |
df_merged[df_merged.user_name == 'emilyguy'].submission_comment.value_counts()
submission    34
comment        4
Name: submission_comment, dtype: int64
df_merged[df_merged.user_name == 'emilyguy'].subreddit.value_counts()
r/gentlemanboners      19
r/Celebs                9
r/celebsnaked           4
r/celebnsfw             3
r/WatchItForThePlot     2
r/CaraDelevingne        1
Name: subreddit, dtype: int64
df_merged[df_merged.user_name == 'emilyguy'].created_at.dt.date.value_counts()
2018-03-22    4
2018-05-05    3
2018-10-05    2
2018-04-13    2
2018-07-03    1
2018-08-18    1
2018-09-17    1
2018-03-19    1
2018-05-06    1
2018-05-01    1
2018-05-25    1
2018-07-23    1
2018-03-08    1
2018-07-04    1
2018-06-19    1
2018-03-23    1
2018-05-16    1
2018-12-22    1
2018-06-28    1
2018-06-11    1
2018-07-05    1
2018-08-26    1
2018-07-12    1
2018-10-17    1
2018-09-25    1
2018-11-15    1
2018-11-01    1
2018-12-16    1
2018-03-24    1
2018-12-29    1
2018-03-13    1
Name: created_at, dtype: int64
df_merged[df_merged.user_name == 'emilyguy'].shape
(38, 24)
emilyguy
Explore the minimum scores
# Filter on minimum scores: keep the 20 lowest-scoring contributions
df_scores_low = df_merged.sort_values('score').head(20)
fig = px.bar(df_scores_low,
             x='user_name',
             y=df_scores_low.score,
             text=df_scores_low.score)
fig.update_layout(title_text='Accounts with minimum contribution scores in 2018', title_x=0.5, title_y=0.2)
fig.update_layout(
    xaxis=dict(
        side='top',
        title='user name',
        tickmode='array',
        tickvals=df_scores_low.user_name,
    )
)
# Highlight bars with a score below -60 in red
clrs = ['red' if (y < -60) else '#5296dd' for y in df_scores_low.score]
fig.update_traces(marker_color=clrs,
                  marker_line_width=2, opacity=1, textposition='auto')
fig.show()
df_scores_low.user_name.head(5)
2467    MedivhTheEternal
807             -banned-
2468    MedivhTheEternal
5754           sanjuhunk
3856        HardSellDude
Name: user_name, dtype: object
# Show full column contents and all columns for this user's rows
with pd.option_context('display.max_colwidth', None, 'display.max_columns', None):
    display(df_merged[df_merged.user_name == 'MedivhTheEternal'])
| | child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | submission_text | user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2465 | t1_dvddsca | /r/gentlemanboners/comments/82wa5c/amber_heard/dvddsca/ | She may be beautiful but shes just a high quality gold digger... screw her.. First Johnny Depp and then Elon Musk... | t3_82wa5c | r/gentlemanboners | 2018-03-08 12:06:55 | Positive | Neutral | 115.0 | submission | comment | amber_heard | MedivhTheEternal | True | False | False | False | 687.0 | 6957.0 | 2017-06-05 07:13:04 | others | others | 276 days 04:53:51 | 276.0 |
| 2466 | t1_dvdj8yw | /r/gentlemanboners/comments/82wa5c/amber_heard/dvdj8yw/ | Fair enough i guess :) | t1_dvdj7qq | r/gentlemanboners | 2018-03-08 14:20:19 | Positive | Positive | 23.0 | comment | comment | amber_heard | MedivhTheEternal | True | False | False | False | 687.0 | 6957.0 | 2017-06-05 07:13:04 | others | others | 276 days 07:07:15 | 276.0 |
| 2467 | t1_dvdlu66 | /r/gentlemanboners/comments/82wa5c/amber_heard/dvdlu66/ | I’ll have to disagree man, im a huge fan of Johnny Depp no matter what i think hes right. I am biased perhaps but i cant help it | t1_dvdllh0 | r/gentlemanboners | 2018-03-08 15:07:07 | Positive | Neutral | -92.0 | comment | comment | amber_heard | MedivhTheEternal | True | False | False | False | 687.0 | 6957.0 | 2017-06-05 07:13:04 | others | others | 276 days 07:54:03 | 276.0 |
| 2468 | t1_dvdm05m | /r/gentlemanboners/comments/82wa5c/amber_heard/dvdm05m/ | I think marrying for money is a more dishonored action | t1_dvdlwtp | r/gentlemanboners | 2018-03-08 15:09:57 | Positive | Neutral | -66.0 | comment | comment | amber_heard | MedivhTheEternal | True | False | False | False | 687.0 | 6957.0 | 2017-06-05 07:13:04 | others | others | 276 days 07:56:53 | 276.0 |
| 2469 | t1_dvdmfpp | /r/gentlemanboners/comments/82wa5c/amber_heard/dvdmfpp/ | I dont play Hearthstone, I play WoW | t1_dvdme6s | r/gentlemanboners | 2018-03-08 15:17:11 | Positive | Neutral | -5.0 | comment | comment | amber_heard | MedivhTheEternal | True | False | False | False | 687.0 | 6957.0 | 2017-06-05 07:13:04 | others | others | 276 days 08:04:07 | 276.0 |
| 2470 | t1_dvxmtip | /r/gentlemanboners/comments/85aijy/amber_heard/dvxmtip/ | I completely agree | t1_dvwjw4v | r/gentlemanboners | 2018-03-19 10:34:42 | Positive | Positive | 1.0 | comment | comment | amber_heard | MedivhTheEternal | True | False | False | False | 687.0 | 6957.0 | 2017-06-05 07:13:04 | others | others | 287 days 03:21:38 | 287.0 |
MedivhTheEternal
# Print each contribution by this user on its own line
for i in df_merged[df_merged.user_name == 'MedivhTheEternal'].text.values:
    print('-', i)
- She may be beautiful but shes just a high quality gold digger... screw her.. First Johnny Depp and then Elon Musk...
- Fair enough i guess :)
- I’ll have to disagree man, im a huge fan of Johnny Depp no matter what i think hes right. I am biased perhaps but i cant help it
- I think marrying for money is a more dishonored action
- I dont play Hearthstone, I play WoW
- I completely agree
Banned / Unverified Accounts
Banned
fakeposter77: made only one submission, on 11-11-2018.
fakeposter77: this user's contributions received the highest scores.
michikoperdue: this user made 3 contributions with the same text at the same time in 3 different subreddits.
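The michikoperdue pattern (identical text posted to several subreddits at once) can be searched for across all accounts. The following is a minimal sketch, assuming the `user_name`, `text`, `subreddit` and `created_at` columns shown in the tables above; the helper name `find_cross_posted_text` and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

def find_cross_posted_text(df, min_subreddits=2):
    """Group by (user_name, text) and keep texts seen in several subreddits."""
    grouped = (
        df.dropna(subset=["text"])
          .groupby(["user_name", "text"])
          .agg(
              n_contributions=("subreddit", "count"),
              n_subreddits=("subreddit", "nunique"),
              first_seen=("created_at", "min"),
              last_seen=("created_at", "max"),
          )
          .reset_index()
    )
    # A short time span between first and last copy suggests automated posting.
    grouped["time_span"] = grouped["last_seen"] - grouped["first_seen"]
    return grouped[grouped["n_subreddits"] >= min_subreddits].reset_index(drop=True)

# Hypothetical toy rows; in the notebook this would be df_banned or df_merged.
toy = pd.DataFrame({
    "user_name": ["user_a", "user_a", "user_a", "user_a", "user_b"],
    "text": ["same text", "same text", "same text", "something else", "hello"],
    "subreddit": ["r/one", "r/two", "r/three", "r/one", "r/one"],
    "created_at": pd.to_datetime(
        ["2018-07-03 10:00:00"] * 3 + ["2018-07-04 09:00:00", "2018-07-05 12:00:00"]
    ),
})
dupes = find_cross_posted_text(toy)
print(dupes)
```

Calling `find_cross_posted_text(df_banned)` would list every banned account that repeated a text across subreddits, together with the time span between the first and last copy.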
Unverified
| Accounts Creation Year | n_contributions |
|---|---|
| 2018 | 33.6% (265) |
| others | 66.4% (524) |
| Accounts Creation Year | n_submissions |
|---|---|
| 2018 | 57.7% (71) |
| others | 42.3% (52) |
| Accounts Creation Year | n_contributions |
|---|---|
| 2018 | 33.3% (7) |
| others | 66.7% (14) |
| Accounts Creation Year | n_submissions |
|---|---|
| 2018 | 57.7% (71) |
| others | 42.3% (52) |
hnglmkrnglbrry: the unverified account with the highest score.
Many unverified users have a link karma of 1.
murph420: has the lowest comment karma.
ibracadabra101: this user has the highest score among the unverified users.
HardSellDude: this user has the lowest score among the unverified users.
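The observation about link karma can be checked by counting each account once rather than once per contribution. This is a rough sketch, assuming the `user_name`, `has_verified_email` (boolean) and `link_karma` columns from the merged dataset; the function name and the toy accounts are hypothetical.

```python
import pandas as pd

def unverified_link_karma_counts(df):
    """Count unverified accounts per link_karma value, one row per user."""
    users = df.drop_duplicates(subset="user_name")
    unverified = users[~users["has_verified_email"]]
    return unverified["link_karma"].value_counts().sort_index()

# Hypothetical toy rows; user_a appears twice but is counted once.
toy_users = pd.DataFrame({
    "user_name": ["user_a", "user_a", "user_b", "user_c", "user_d"],
    "has_verified_email": [False, False, False, False, True],
    "link_karma": [1, 1, 1, 250, 1],
})
karma_counts = unverified_link_karma_counts(toy_users)
print(karma_counts)
```

Applied to `df_merged`, the first entries of the result show how many unverified accounts sit at a link karma of 1.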
2018 Accounts
| Accounts Creation Year | n_contributions |
|---|---|
| 2018 | 12.5% (875) |
| banned | 8.38% (586) |
| others | 79.1% (5532) |
| Accounts Creation Year | n_submissions |
|---|---|
| 2018 | 11.9% (239) |
| banned | 15.3% (305) |
| others | 72.8% (1456) |
| Accounts Creation Year | n_contributions |
|---|---|
| 2018 | 7.89% (15) |
| banned | 17.4% (33) |
| others | 74.7% (142) |
| Accounts Creation Year | n_submissions |
|---|---|
| 2018 | 3.33% (1) |
| banned | 33.3% (10) |
| others | 63.3% (19) |
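The share tables above pair a percentage with a raw count for each account group. A minimal sketch of how such a table can be derived from a label column follows; it assumes the group labels (`2018`, `banned`, `others`) live in a single column such as `creation_year`, which is a guess based on the column names shown earlier, and `share_table` is a hypothetical helper. The counts used are the ones reported in the first table of this section.

```python
import pandas as pd

def share_table(labels):
    """Return count and percentage share for each label."""
    counts = labels.value_counts()
    percent = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": percent})

# Rebuild the label column from the reported counts (875 / 586 / 5532).
labels = pd.Series(
    ["2018"] * 875 + ["banned"] * 586 + ["others"] * 5532,
    name="creation_year",
)
shares = share_table(labels)
print(shares)
```

The same helper can be applied to a filtered frame (submissions only, or the peak day only) to produce the remaining tables.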
emilyguy
MedivhTheEternal